218 research outputs found

    Silent speech: restoring the power of speech to people whose larynx has been removed

    Get PDF
    Every year, some 17,500 people in Europe and North America lose the power of speech after undergoing a laryngectomy, normally as a treatment for throat cancer. Several research groups have recently demonstrated that it is possible to restore speech to these people by using machine learning to learn the transformation from articulator movement to sound. In our project articulator movement is captured by a technique developed by our collaborators at Hull University called Permanent Magnet Articulography (PMA), which senses the changes of magnetic field caused by movements of small magnets attached to the lips and tongue. This solution, however, requires synchronous PMA-and-audio recordings for learning the transformation and, hence, it cannot be applied to people who have already lost their voice. Here we propose to investigate a variant of this technique in which the PMA data are used to drive an articulatory synthesiser, which generates speech acoustics by simulating the airflow through a computational model of the vocal tract. The project goals, participants, current status, and achievements of the project are discussed below.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Reporting back environmental exposure data and free choice learning.

    Get PDF
    Reporting data back to study participants is increasingly being integrated into exposure and biomonitoring studies. Informal science learning opportunities are valuable in environmental health literacy efforts and report back efforts are filling an important gap in these efforts. Using the University of Arizona's Metals Exposure Study in Homes, this commentary reflects on how community-engaged exposure assessment studies, partnered with data report back efforts are providing a new informal education setting and stimulating free-choice learning. Participants are capitalizing on participating in research and leveraging their research experience to meet personal and community environmental health literacy goals. Observations from report back activities conducted in a mining community support the idea that reporting back biomonitoring data reinforces free-choice learning and this activity can lead to improvements in environmental health literacy. By linking the field of informal science education to the environmental health literacy concepts, this commentary demonstrates how reporting data back to participants is tapping into what an individual is intrinsically motivated to learn and how these efforts are successfully responding to community-identified education and research needs

    A silent speech system based on permanent magnet articulography and direct synthesis

    Get PDF
    In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, which is a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies

    Non-Parallel Articulatory-to-Acoustic Conversion Using Multiview-based Time Warping

    Get PDF
    This work was supported in part by the Spanish State Research Agency (SRA) grant number PID2019-108040RB-C22/SRA/10.13039/501100011033, and the FEDER/Junta de AndalucíaConsejería de Transformación Económica, Industria, Conocimiento y Universidades project no. B-SEJ-570-UGR20.In this paper, we propose a novel algorithm called multiview temporal alignment by dependence maximisation in the latent space (TRANSIENCE) for the alignment of time series consisting of sequences of feature vectors with different length and dimensionality of the feature vectors. The proposed algorithm, which is based on the theory of multiview learning, can be seen as an extension of the well-known dynamic time warping (DTW) algorithm but, as mentioned, it allows the sequences to have different dimensionalities. Our algorithm attempts to find an optimal temporal alignment between pairs of nonaligned sequences by first projecting their feature vectors into a common latent space where both views are maximally similar. To do this, powerful, nonlinear deep neural network (DNN) models are employed. Then, the resulting sequences of embedding vectors are aligned using DTW. Finally, the alignment paths obtained in the previous step are applied to the original sequences to align them. In the paper, we explore several variants of the algorithm that mainly differ in the way the DNNs are trained. We evaluated the proposed algorithm on a articulatory-to-acoustic (A2A) synthesis task involving the generation of audible speech from motion data captured from the lips and tongue of healthy speakers using a technique known as permanent magnet articulography (PMA). In this task, our algorithm is applied during the training stage to align pairs of nonaligned speech and PMA recordings that are later used to train DNNs able to synthesis speech from PMA data. Our results show the quality of speech generated in the nonaligned scenario is comparable to that obtained in the parallel scenario.Spanish State Research Agency (SRA) PID2019-108040RB-C22/SRA/10.13039/501100011033FEDER/Junta de AndalucíaConsejería de Transformación Económica, Industria, Conocimiento y Universidades project no. B-SEJ-570-UGR20

    Integrating user-centred design in the development of a silent speech interface based on permanent magnetic articulography

    Get PDF
    Abstract: A new wearable silent speech interface (SSI) based on Permanent Magnetic Articulography (PMA) was developed with the involvement of end users in the design process. Hence, desirable features such as appearance, port-ability, ease of use and light weight were integrated into the prototype. The aim of this paper is to address the challenges faced and the design considerations addressed during the development. Evaluation on both hardware and speech recognition performances are presented here. The new prototype shows a com-parable performance with its predecessor in terms of speech recognition accuracy (i.e. ~95% of word accuracy and ~75% of sequence accuracy), but significantly improved appearance, portability and hardware features in terms of min-iaturization and cost

    Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis

    Get PDF
    Articulatory-to-acoustic (A2A) synthesis refers to the generation of audible speech from captured movement of the speech articulators. This technique has numerous applications, such as restoring oral communication to people who cannot longer speak due to illness or injury. Most successful techniquesso far adopt a supervised learning framework, in which timesynchronousarticulatory-and-speech recordings are used to train a supervised machine learning algorithm that can be used later to map articulator movements to speech. This, however, prevents the application of A2A techniques in cases where parallel data is unavailable, e.g., a person has already lost her/his voice and only articulatory data can be captured. In this work, we propose a solution to this problem based on the theory of multi-view learning. The proposed algorithm attempts to find an optimal temporal alignment between pairs of nonaligned articulatory-and-acoustic sequences with the same phonetic content by projecting them into a common latent space where both views are maximally correlated and then applying dynamic time warping. Several variants of this idea are discussed and explored. We show that the quality of speech generated in the non-aligned scenario is comparable to that obtained in the parallel scenario.This work was funded by the Spanish State Research Agency (SRA) under the grant PID2019-108040RBC22/ SRA/10.13039/501100011033. Jose A. Gonzalez-Lopez holds a Juan de la Cierva-Incorporation Fellowship from the Spanish Ministry of Science, Innovation and Universities (IJCI-2017-32926)

    Cascaded- and Modular-Multilevel Converter Laboratory Test System Options: A Review

    Get PDF
    The increasing importance of cascaded multilevel converters (CMCs), and the sub-category of modular multilevel converters (MMCs), is illustrated by their wide use in high voltage DC connections and in static compensators. Research is being undertaken into the use of these complex pieces of hardware and software for a variety of grid support services, on top of fundamental frequency power injection, requiring improved control for non-traditional duties. To validate these results, small-scale laboratory hardware prototypes are often required. Such systems have been built by many research teams around the globe and are also increasingly commercially available. Few publications go into detail on the construction options for prototype CMCs, and there is a lack of information on both design considerations and lessons learned from the build process, which will hinder research and the best application of these important units. This paper reviews options, gives key examples from leading research teams, and summarizes knowledge gained in the development of test rigs to clarify design considerations when constructing laboratory-scale CMCs.This work was supported in part by The University of Manchester supported by the National Innovation Allowance project ``VSC-HVDC Model Validation and Improvement'' and Dr. Heath's iCASE Ph.D. studentship supported through Engineering and Physical Sciences Research Council (EPSRC) and National Grid, in part by the Imperial College London supported by EPSRC through the HubNet Extension under Grant EP/N030028/1, in part by an iCASE Ph.D. Studentship supported by EPSRC and EDF Energy and the CDT in Future Power Networks under Grant EP/L015471/1, in part by University of New South Wales (UNSW) supported by the Solar Flagships Program through the Education Infrastructure Fund (EIF), in part by the Australian Research Council through the Discovery Early Career Research Award under Grant DECRA_DE170100370, in part by the Basque Government through the project HVDC-LINK3 under Grant ELKARTEK KK-2017/00083, in part by the L2EP research group at the University of Lille supported by the French TSO (RTE), and in part by the Hauts-de-France region of France with the European Regional Development Fund under Grant FEDER 17007725

    Real-Time Detection and Rapid Multiwavelength Follow-up Observations of a Highly Subluminous Type II-P Supernova from the Palomar Transient Factory Survey

    Full text link
    The Palomar Transient Factory (PTF) is an optical wide-field variability survey carried out using a camera with a 7.8 square degree field of view mounted on the 48-in Oschin Schmidt telescope at Palomar Observatory. One of the key goals of this survey is to conduct high-cadence monitoring of the sky in order to detect optical transient sources shortly after they occur. Here, we describe the real-time capabilities of the PTF and our related rapid multiwavelength follow-up programs, extending from the radio to the gamma-ray bands. We present as a case study observations of the optical transient PTF10vdl (SN 2010id), revealed to be a very young core-collapse (Type II-P) supernova having a remarkably low luminosity. Our results demonstrate that the PTF now provides for optical transients the real-time discovery and rapid-response follow-up capabilities previously reserved only for high-energy transients like gamma-ray bursts.Comment: ApJ, in press; all spectroscopic data available from the Weizmann Institute of Science Experimental Astrophysics Spectroscopy System (WISEASS; http://www.weizmann.ac.il/astrophysics/wiseass/
    corecore